Application of image processing methods to filled pauses detection from spontaneous speech
Abstract
To achieve more human-like interaction, technical systems have to adapt to the user's individual preferences and current emotional state. In human-human interaction, the behaviour of the speaker is characterised by semantic and prosodic cues, given (among other indicators) as short feedback signals. These so-called filled pauses minimally convey certain dialogue functions such as attention, understanding, confirmation, or other attitudinal reactions. These signals play a valuable role in the progress and coordination of interaction. The first step in enabling an automatic system to react to these signals is to detect them within the user's utterances. This is a complex task, as filled pauses are phonetically short, mostly consisting of only one vowel and one consonant. In this paper we present our methods to detect filled pauses in a naturalistic interaction, utilising the LAST MINUTE corpus. We used an SVM classifier and improved the results further by applying a Gaussian filter to infer temporal context information and performing a morphological opening to filter false alarms. We obtained a recall of 70%, a precision of 55%, and an AUC of 0.94.
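The post-processing idea described in the abstract (Gaussian smoothing of classifier output over time, followed by a morphological opening to drop short false alarms) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the SVM already produces one score in [0, 1] per frame, and the function names and the parameters `sigma`, `threshold`, and `min_frames` are hypothetical values chosen for the example, since the abstract does not state them.

```python
import math


def gaussian_smooth(scores, sigma=2.0):
    """Smooth a 1-D score sequence with a truncated, normalised Gaussian kernel."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    n = len(scores)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, kernel):
            j = min(max(i + k, 0), n - 1)  # replicate the edge values
            acc += w * scores[j]
        smoothed.append(acc)
    return smoothed


def erode(mask, width):
    """Keep a frame only if the whole window centred on it is set."""
    half = width // 2
    n = len(mask)
    return [
        all(0 <= j < n and mask[j] for j in range(i - half, i + half + 1))
        for i in range(n)
    ]


def dilate(mask, width):
    """Set a frame if any frame in the window centred on it is set."""
    half = width // 2
    n = len(mask)
    return [
        any(mask[j] for j in range(max(0, i - half), min(n, i + half + 1)))
        for i in range(n)
    ]


def opening(mask, width):
    """Morphological opening: erosion followed by dilation.

    With an odd width, runs shorter than `width` frames are removed and
    longer runs are restored to their original extent.
    """
    return dilate(erode(mask, width), width)


def detect_filled_pauses(scores, sigma=2.0, threshold=0.5, min_frames=5):
    """Return a per-frame boolean mask of detected filled pauses.

    `scores` are assumed per-frame classifier outputs; all parameter
    values here are illustrative assumptions, not taken from the paper.
    """
    smoothed = gaussian_smooth(scores, sigma)
    mask = [s > threshold for s in smoothed]
    return opening(mask, min_frames)
```

Calling `detect_filled_pauses` on a score sequence with one isolated high frame and one sustained high region should keep only the sustained region: the smoothing spreads the isolated spike below the threshold, and the opening discards any remaining run shorter than `min_frames`.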
Similar resources
Acoustic Feature Analysis and Discriminative Modeling of Filled Pauses for Spontaneous Speech Recognition
Most automatic speech recognizers (ASRs) concentrate on read speech, which is different from spontaneous speech with disfluencies. ASRs cannot deal with speech with a high rate of disfluencies such as filled pauses, repetitions, lengthening, repairs, false starts and silence pauses. In this paper, we focus on the feature analysis and modeling of the filled pauses “ah,” “ung,” “um,” “em,” and “h...
Detection of filled pauses in spontaneous conversational speech
Most automatic speech recognition work has concentrated on read speech, whose acoustic aspects differ significantly from speech found in actual dialogues. A primary difference between read speech and spontaneous speech concerns a high rate of disfluencies (e.g., filled pauses, repetitions, repairs, false starts). Filled pauses (e.g., “uh,” “um”), unlike silences, resemble phones as part of word...
Automatic Detection and Removal of Disfluencies from Spontaneous Speech
Unlike rehearsed and prepared speech, spontaneous speech contains a high occurrence of disfluencies, such as repetitions, filled pauses, and hesitations. Disfluencies can seriously hamper the word recognition accuracy of an Automatic Speech Recogniser (ASR) by increasing word insertion, deletion, and rejection rates. In this paper we introduce signal processing algorithms to automatically identif...
Pauses in Deceptive Speech
We use a corpus of spontaneous interview speech to investigate the relationship between the distributional and prosodic characteristics of silent and filled pauses and the intent of an interviewee to deceive an interviewer. Our data suggest that the use of pauses correlates more with truthful than with deceptive speech, and that prosodic features extracted from filled pauses themselves as well ...
A Real-time System Detecting Filled Pauses in Spontaneous Speech
This paper describes a method for detecting filled pauses (including word lengthening), which are one type of hesitation phenomenon. Detecting them is important in speech dialogue systems because filled pauses play valuable roles in oral communication. Although there have been a few previous speech recognition systems handling filled pauses, they have not detected them individually and consequently could ...